📝 Objective: Perform the analysis of Metagenome-Assembled Genomes (MAGs) recovered from a public dataset using an end-to-end pipeline.
The pipeline used to build the MAGs was nf-core/mag, which integrates several tools to assemble the sequences and polish the recovered MAGs. The usage of the pipeline can be checked here, and below is the workflow it follows:
Since many of the downstream processes demand high computational resources, we have pre-computed some of them for you. However, we will either explain step by step what we did or provide references so you can review what is happening under the hood.
MAGFlow is a tool designed to combine several tools to measure the quality of the bins/MAGs, as well as to taxonomically annotate them. This is the workflow:
The output of this tool is a ready-to-use file that concatenates the results, which can then be used as input for BIgMAG. We will not execute MAGFlow today; instead, you will find the pre-computed final_df.tsv, which is all you need to display the BIgMAG dashboard. Let’s create the environment for BIgMAG by running the following commands in the GitHub Codespace terminal:
Bash
git clone https://github.com/jeffe107/BIgMAG
conda create -n BIgMAG --file BIgMAG/requirements.txt
conda activate BIgMAG
⚠ WARN: Whenever you are asked whether to install extra packages, please say yes to all!
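Before launching the dashboard, it can help to take a quick look at the table MAGFlow produced. The snippet below is only a minimal sketch: the function name `tsv_overview` is hypothetical, and it assumes nothing about the file beyond it being tab-separated with a single header line.

```shell
# Sketch: report how many data rows and columns a tab-separated table has,
# then list its column names with their positions.
# The function name is hypothetical (not part of MAGFlow or BIgMAG).
tsv_overview() {
    awk -F'\t' 'NR == 1 { ncol = NF; for (i = 1; i <= NF; i++) names[i] = $i }
    END {
        printf "rows\t%d\ncolumns\t%d\n", NR - 1, ncol
        for (i = 1; i <= ncol; i++) print i "\t" names[i]
    }' "$1"
}

# Example: tsv_overview data/magflow/final_df.tsv
```

The column list is useful later on, when you want to know which metrics the dashboard has available.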
Now we are ready to run BIgMAG for an exploratory analysis of the overall quality and annotation of the MAGs:
Bash
BIgMAG/app.py -p 8050 data/magflow/final_df.tsv
The terminal will show a link to the dashboard, or the editor will offer to open it in a new browser tab; just click on either one.
Now it is your turn to analyze the results. Use these questions to guide your thoughts:
❓ Question: Overall, do the MAGs recovered from your assigned samples have good quality? What would you suggest to improve the bins whose quality is insufficient?
❓ Question: Do the results reported by BUSCO and CheckM2 agree? Why do you think this is happening?
❓ Question: Regarding the taxonomic classification, what would you report when comparing the samples?
❓ Question: Were the samples clustered as you expected? What caused this behavior?
❓ Question: Is there any unusual sample or MAG that catches your attention? What further analysis would you propose to dig into it?
⚠ WARN: Do not forget to stop the dashboard with Ctrl + c, and to deactivate the environment with:
conda deactivate
Moving forward, it is now time to identify genomic features within the bins/MAGs. KEGG Decoder is a tool that interprets the metabolic potential of the MAGs (or of any genome in general) by analyzing the presence of KEGG Orthology (KO) identifiers. It maps these KOs to major metabolic pathways and summarizes the completeness of each pathway based on predefined modules. In this way, we can infer the metabolic capabilities of the community and gain insights into functional differences across samples or MAGs.
Prokka (included within nf-core/mag) has annotated the presence and location of enzymes in the bins/MAGs that are involved in a wide variety of processes. However, Prokka describes these genomic features with EC numbers, so we had to convert the annotations to K numbers and merge them into a single file (this was already done for you). Now, let’s install KEGG Decoder:
Bash
conda create -n keggdecoder python=3.6
conda activate keggdecoder
python3 -m pip install KEGGDecoder
⚠ WARN: Whenever you are asked whether to install extra packages, please say yes to all!
Now, we are ready to execute this software to determine the presence and completeness of the annotated metabolic pathways:
Bash
KEGG-decoder --input data/k_numbers/megahit_k_numbers.tsv --output kegg_output.tsv --vizoption static
This command will create a heatmap where you can perform a comparison across MAGs. In this case, we are analyzing the MAGs obtained with the assembler MEGAHIT. It should look like this:
On the y axis, each row is labeled with a seemingly random string of letters. These are the identifiers Prokka has assigned to the genomic features of each MAG. The table below maps the name of each bin/MAG to its contig name:
| Bin | ContigName |
|---|---|
| MEGAHIT-MaxBin2-ERR2143759.001.tsv | GIFAEIHF |
| MEGAHIT-MaxBin2-ERR2143759.002.tsv | CIJAKMNO |
| MEGAHIT-MaxBin2-ERR2143759.003.tsv | IACMDEGO |
| MEGAHIT-MaxBin2-ERR2143759.004.tsv | FKGEEGMK |
| MEGAHIT-MaxBin2-ERR2143759.005.tsv | NBLBKFGP |
| MEGAHIT-MaxBin2-ERR2143760.001.tsv | CDMCPOJD |
| MEGAHIT-MaxBin2-ERR2143760.002.tsv | PJEJNNEC |
| MEGAHIT-MaxBin2-ERR2143760.003.tsv | ELBPOCGG |
| MEGAHIT-MaxBin2-ERR2143760.004.tsv | LPDNOCIE |
| MEGAHIT-MaxBin2-ERR2143771.001.tsv | KEBGIGOI |
| MEGAHIT-MaxBin2-ERR2143771.002.tsv | GNGAKHBD |
| MEGAHIT-MaxBin2-ERR2143772.001.tsv | NGILDPFJ |
| MEGAHIT-MaxBin2-ERR2143772.002.tsv | BNBNNLFK |
| MEGAHIT-MaxBin2-ERR2143773.001.tsv | OLADMAGF |
| MEGAHIT-MaxBin2-ERR2143773.002.tsv | EPNPGCJF |
| MEGAHIT-MaxBin2-ERR2143773.003.tsv | OGHEFGBH |
| MEGAHIT-MetaBAT2-ERR2143759.1.tsv | EOMIBOKB |
| MEGAHIT-MetaBAT2-ERR2143759.2.tsv | IMFCMJNE |
| MEGAHIT-MetaBAT2-ERR2143759.3.tsv | JPIOBONM |
| MEGAHIT-MetaBAT2-ERR2143759.4.tsv | IIKMGHIJ |
| MEGAHIT-MetaBAT2-ERR2143759.5.tsv | FJMNHDHH |
| MEGAHIT-MetaBAT2-ERR2143759.6.tsv | HNFOOLAK |
| MEGAHIT-MetaBAT2-ERR2143759.7.tsv | KKNACEHJ |
| MEGAHIT-MetaBAT2-ERR2143760.1.tsv | JMOHPCEH |
| MEGAHIT-MetaBAT2-ERR2143760.2.tsv | JNLKELBO |
| MEGAHIT-MetaBAT2-ERR2143760.3.tsv | BPOICHMO |
| MEGAHIT-MetaBAT2-ERR2143760.4.tsv | OJEIIDGO |
| MEGAHIT-MetaBAT2-ERR2143760.5.tsv | OINBHHLA |
| MEGAHIT-MetaBAT2-ERR2143771.1.tsv | EONPCBDF |
| MEGAHIT-MetaBAT2-ERR2143771.2.tsv | OJCBHCCG |
| MEGAHIT-MetaBAT2-ERR2143771.3.tsv | JADMGMAL |
| MEGAHIT-MetaBAT2-ERR2143771.4.tsv | KKGCJLIG |
| MEGAHIT-MetaBAT2-ERR2143772.1.tsv | KHPHFBJJ |
| MEGAHIT-MetaBAT2-ERR2143772.2.tsv | NECFOIJH |
| MEGAHIT-MetaBAT2-ERR2143772.3.tsv | KDCCMDHF |
| MEGAHIT-MetaBAT2-ERR2143772.4.tsv | FDKDGLCP |
| MEGAHIT-MetaBAT2-ERR2143773.1.tsv | IJLNKNIL |
| MEGAHIT-MetaBAT2-ERR2143773.2.tsv | MACHMHHK |
| MEGAHIT-MetaBAT2-ERR2143773.3.tsv | IGFKCGOH |
| MEGAHIT-MetaBAT2-ERR2143773.4.tsv | LLMAHDOM |
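To translate a heatmap row label back into a bin name without scanning the table by eye, the mapping can be queried from the terminal. This is a minimal sketch under stated assumptions: the table above has been saved as a tab-separated file with a header line (the file name `bin_mapping.tsv` is hypothetical), and `bin_for_prefix` is an illustrative helper, not part of any of the tools used here.

```shell
# Hypothetical helper: print the bin/MAG name for a given Prokka identifier.
# Assumes a tab-separated file with one header line and two columns:
# Bin <TAB> ContigName (the same layout as the table above).
bin_for_prefix() {
    local prefix="$1" mapping="$2"
    awk -F'\t' -v p="$prefix" 'NR > 1 && $2 == p { print $1 }' "$mapping"
}

# Example: bin_for_prefix OJEIIDGO bin_mapping.tsv
```

If the identifier is not in the mapping file, the function prints nothing.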
⚠ WARN: Do not forget to deactivate the environment with:
conda deactivate
Clusters of Orthologous Genes (COGs) are groups of genes from different organisms that evolved from a common ancestral gene and retain the same function. To explore these genes, we rely on the amino-acid sequences of the coding regions provided by Prokka (.faa files). The next task is to install COGclassifier in order to detect Clusters of Orthologous Genes:
Bash
conda create -n cogclassifier -c conda-forge -c bioconda cogclassifier
conda activate cogclassifier
⚠ WARN: Whenever you are asked whether to install extra packages, please say yes to all!
This tool automates the whole process, from searching the query sequences against the COG database, through annotation and classification of gene functions, to the generation of publication-ready figures. However, it accepts only one genome at a time. This does not mean that we cannot analyze all the bins/MAGs: we can simply run the software on each of them and integrate the data afterwards. To launch the tool, run the following on the terminal:
Bash
COGclassifier -i data/faas/MEGAHIT-MetaBAT2-ERR2143759.7.faa -o cog_annotation --download_dir ./cog_database
Once it has finished, inside cog_annotation you will see
the output represented as tables featuring counts, summary and
annotations, as well as interesting figures showcasing the proportion of
the different COG categories, just like this:
❓ Question: Which categories are most represented in this MAG? Do you see any odd results?
❓ Question: If you decide to analyze all of the MAGs, do you think it would be a fair comparison just using the raw counts obtained by the software? If not, what strategy would you propose to proceed further?
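If you do decide to analyze all of the MAGs, the one-genome-at-a-time runs can be wrapped in a loop. The following is a sketch, not an official COGclassifier feature: the function name `run_cog_all` and the `COG_CMD` variable are hypothetical, and only the `-i`, `-o` and `--download_dir` options already used in this tutorial are passed to the tool.

```shell
# Sketch: run COGclassifier once per .faa file, writing each result to its
# own sub-folder. run_cog_all and COG_CMD are hypothetical names; COG_CMD
# only exists so the command can be swapped out for a dry run.
run_cog_all() {
    local faa_dir="$1" out_dir="$2" db_dir="$3"
    local cmd="${COG_CMD:-COGclassifier}"
    local faa name
    for faa in "$faa_dir"/*.faa; do
        [ -e "$faa" ] || continue        # nothing to do if no .faa file matches
        name=$(basename "$faa" .faa)     # e.g. MEGAHIT-MetaBAT2-ERR2143759.7
        $cmd -i "$faa" -o "$out_dir/$name" --download_dir "$db_dir"
    done
}

# Example: run_cog_all data/faas cog_annotation_all ./cog_database
# Dry run: COG_CMD="echo COGclassifier" run_cog_all data/faas cog_annotation_all ./cog_database
```

Keeping one output folder per MAG makes it straightforward to collect the per-genome tables later.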
⚠ WARN: Do not forget to deactivate the environment with:
conda deactivate
CAZymes (Carbohydrate-Active enZymes) are enzymes involved in the breakdown, biosynthesis, or modification of carbohydrates and glycoconjugates. They play a crucial role in processing complex carbohydrates such as cellulose, hemicellulose, starch, and chitin, among others. To detect these enzymes, we are going to use dbCAN3, an automated web server that runs the annotation software and returns the results.
To execute the tool, you just need to download the file from
Codespaces
data/faas/MEGAHIT-MetaBAT2-ERR2143759.7.faa, upload it to
the server (click here), submit the job
and wait for the results.
It should look like this:
| GeneID | EC | HMMER | dbCAN_sub | DIAMOND | Signalp | NofTools |
|---|---|---|---|---|---|---|
| OJEIIDGO_00076 | None | GH2(51-929) | None | None | Y(1-22) | 1 |
| OJEIIDGO_00091 | None | GH20(151-512) | None | None | Y(1-20) | 1 |
| OJEIIDGO_00128 | None | GH57(38-327) | None | None | N | 1 |
| OJEIIDGO_00143 | None | CE1(19-254) | None | None | N | 1 |
| OJEIIDGO_00151 | None | GT35(270-635) | None | None | N | 1 |
| OJEIIDGO_00156 | None | GT41(135-688) | None | None | N | 1 |
| OJEIIDGO_00164 | None | CE1(30-279) | None | None | N | 1 |
| OJEIIDGO_00288 | None | GT83(81-370) | None | None | N | 1 |
| OJEIIDGO_00300 | None | CE20(100-275)+CE20(404-530) | None | None | N | 1 |
| OJEIIDGO_00303 | None | GH16_3(35-227) | None | None | N | 1 |
| OJEIIDGO_00310 | None | GH23(76-208) | None | None | Y(1-22) | 1 |
| OJEIIDGO_00341 | None | GH73(158-287) | None | None | N | 1 |
| OJEIIDGO_00345 | None | GH177(38-426) | None | None | Y(1-35) | 1 |
| OJEIIDGO_00393 | None | GH109(48-198) | None | None | Y(1-30) | 1 |
| OJEIIDGO_00403 | None | GT117(15-234) | None | None | N | 1 |
| OJEIIDGO_00418 | None | AA3(8-564) | None | None | N | 1 |
| OJEIIDGO_00421 | None | GT51(39-219) | None | None | N | 1 |
| OJEIIDGO_00582 | None | AA3(3-568) | None | None | N | 1 |
| OJEIIDGO_00608 | None | GT2(3-113) | None | None | N | 1 |
| OJEIIDGO_00632 | None | GH25(32-203) | None | None | Y(1-17) | 1 |
| OJEIIDGO_00633 | None | CE7(21-301) | None | None | N | 1 |
| OJEIIDGO_00690 | None | GH109(41-201) | None | None | N | 1 |
| OJEIIDGO_00711 | None | GT4(162-301) | None | None | N | 1 |
| OJEIIDGO_00726 | None | GT2(7-137) | None | None | N | 1 |
| OJEIIDGO_00790 | None | GH177(39-417) | None | None | N | 1 |
| OJEIIDGO_00879 | None | GH73(136-324) | None | None | N | 1 |
| OJEIIDGO_00903 | None | GH31_1(1-431) | None | None | N | 1 |
| OJEIIDGO_00905 | None | GH133(51-405) | None | None | N | 1 |
| OJEIIDGO_00933 | None | GH23(80-213) | None | None | N | 1 |
| OJEIIDGO_00967 | None | GT4(185-327) | None | None | N | 1 |
| OJEIIDGO_01003 | None | GH10(530-827) | None | None | Y(1-25) | 1 |
| OJEIIDGO_01012 | None | GH13_16(35-388) | None | None | N | 1 |
| OJEIIDGO_01015 | None | GT2(5-108) | None | None | N | 1 |
| OJEIIDGO_01016 | None | GT2(45-271) | None | None | N | 1 |
| OJEIIDGO_01038 | None | GT4(220-364) | None | None | N | 1 |
| OJEIIDGO_01075 | None | GH51_1(25-513) | None | None | Y(1-30) | 1 |
| OJEIIDGO_01100 | None | CE1(486-698) | None | None | Y(1-21) | 1 |
| OJEIIDGO_01133 | None | GH2(321-692) | None | None | N | 1 |
| OJEIIDGO_01214 | None | GH9(443-830) | None | None | Y(1-23) | 1 |
| OJEIIDGO_01247 | None | GT83(3-459) | None | None | N | 1 |
| OJEIIDGO_01282 | None | GT2(5-134) | None | None | N | 1 |
| OJEIIDGO_01283 | None | GT2(7-175) | None | None | N | 1 |
| OJEIIDGO_01317 | None | CBM13(1045-1182) | None | None | N | 1 |
| OJEIIDGO_01321 | None | GH16_3(163-394) | None | None | Y(1-25) | 1 |
| OJEIIDGO_01341 | None | GT2(46-189) | None | None | N | 1 |
| OJEIIDGO_01344 | None | GT4(195-336) | None | None | N | 1 |
| OJEIIDGO_01345 | None | GT4(195-350) | None | None | N | 1 |
| OJEIIDGO_01347 | None | GT4(199-345) | None | None | N | 1 |
| OJEIIDGO_01350 | None | GH188(391-622) | None | None | N | 1 |
| OJEIIDGO_01409 | None | GT2(5-133) | None | None | N | 1 |
| OJEIIDGO_01411 | None | CE14(44-155) | None | None | Y(1-24) | 1 |
| OJEIIDGO_01446 | None | CBM48(27-107)+GH13_9(166-463) | None | None | N | 1 |
| OJEIIDGO_01484 | None | GH179(12-247) | None | None | N | 1 |
| OJEIIDGO_01520 | None | CE14(6-111) | None | None | N | 1 |
| OJEIIDGO_01524 | None | GT4(194-342) | None | None | N | 1 |
| OJEIIDGO_01526 | None | GT2(73-301) | None | None | N | 1 |
| OJEIIDGO_01529 | None | GT2(6-165) | None | None | N | 1 |
| OJEIIDGO_01533 | None | GT30(45-208) | None | None | N | 1 |
| OJEIIDGO_01552 | None | GT2(2-82) | None | None | N | 1 |
| OJEIIDGO_01586 | None | CBM9(41-212) | None | None | N | 1 |
| OJEIIDGO_01612 | None | GT20(2-460) | None | None | N | 1 |
| OJEIIDGO_01621 | None | PL9_2(325-677) | None | None | Y(1-25) | 1 |
| OJEIIDGO_01631 | None | CBM48(29-114)+GH13_9(184-483) | None | None | N | 1 |
| OJEIIDGO_01720 | None | GH144(34-450) | None | None | Y(1-20) | 1 |
| OJEIIDGO_01721 | None | CBM102(60-174)+CBM102(214-332)+CBM102(366-483)+CBM102(521-644)+CBM102(682-814) | None | None | Y(1-24) | 1 |
| OJEIIDGO_01722 | None | GH144(119-520) | None | None | Y(1-22) | 1 |
| OJEIIDGO_01759 | None | GT83(4-374) | None | None | N | 1 |
| OJEIIDGO_01784 | None | GH140(19-442) | None | None | Y(1-19) | 1 |
| OJEIIDGO_01808 | None | GH171(55-398) | None | None | Y(1-32) | 1 |
| OJEIIDGO_01877 | None | GT2(6-116) | None | None | N | 1 |
| OJEIIDGO_01891 | None | GH88(50-438) | None | None | N | 1 |
| OJEIIDGO_01907 | None | GT9(65-311) | None | None | N | 1 |
| OJEIIDGO_01916 | None | GT5(5-224) | None | None | N | 1 |
| OJEIIDGO_01949 | None | GT4(193-345) | None | None | N | 1 |
| OJEIIDGO_02005 | None | GT2(329-477) | None | None | N | 1 |
| OJEIIDGO_02016 | None | GH74(112-202) | None | None | Y(1-23) | 1 |
| OJEIIDGO_02023 | None | GH179(26-250) | None | None | N | 1 |
| OJEIIDGO_02026 | None | GH177(24-384) | None | None | N | 1 |
| OJEIIDGO_02045 | None | GH18(21-280) | None | None | N | 1 |
| OJEIIDGO_02059 | None | GT51(90-270) | None | None | N | 1 |
| OJEIIDGO_02071 | None | GH3(108-332) | None | None | N | 1 |
| OJEIIDGO_02073 | None | GT55(39-418) | None | None | N | 1 |
| OJEIIDGO_02078 | None | GH73(34-168) | None | None | N | 1 |
| OJEIIDGO_02099 | None | GH179(58-232) | None | None | Y(1-32) | 1 |
| OJEIIDGO_02104 | None | GH109(3-191) | None | None | N | 1 |
| OJEIIDGO_02141 | None | GT10(68-264) | None | None | N | 1 |
| OJEIIDGO_02142 | None | GT8(3-241) | None | None | N | 1 |
| OJEIIDGO_02146 | None | GT4(223-366) | None | None | N | 1 |
| OJEIIDGO_02147 | None | GT2(7-132) | None | None | N | 1 |
| OJEIIDGO_02148 | None | GT2(4-163) | None | None | N | 1 |
| OJEIIDGO_02150 | None | GH43_1(17-328) | None | None | N | 1 |
| OJEIIDGO_02206 | None | CE9(4-255) | None | None | N | 1 |
| OJEIIDGO_02230 | None | CE1(49-268)+CE1(409-635) | None | None | Y(1-20) | 1 |
| OJEIIDGO_02231 | None | GH43_12(57-351)+CBM91(387-573) | None | None | N | 1 |
| OJEIIDGO_02246 | None | GH188(10-204) | None | None | N | 1 |
| OJEIIDGO_02276 | None | PL33(426-582) | None | None | Y(1-22) | 1 |
| OJEIIDGO_02278 | None | PL35(401-574) | None | None | Y(1-26) | 1 |
| OJEIIDGO_02307 | None | GH188(2-192) | None | None | N | 1 |
| OJEIIDGO_02319 | None | GT2(6-172) | None | None | N | 1 |
| OJEIIDGO_02326 | None | GT4(190-338) | None | None | N | 1 |
| OJEIIDGO_02336 | None | GT2(16-177) | None | None | N | 1 |
| OJEIIDGO_02382 | None | GT2(49-160) | None | None | N | 1 |
| OJEIIDGO_02415 | None | GT4(203-346) | None | None | N | 1 |
| OJEIIDGO_02445 | None | GH179(19-342) | None | None | N | 1 |
| OJEIIDGO_02450 | None | GT2(4-148) | None | None | N | 1 |
| OJEIIDGO_02458 | None | GT4(192-342) | None | None | N | 1 |
| OJEIIDGO_02469 | None | GH113(36-340) | None | None | Y(1-23) | 1 |
| OJEIIDGO_02492 | None | GT2(52-274) | None | None | N | 1 |
| OJEIIDGO_02512 | None | PL12(65-203) | None | None | N | 1 |
| OJEIIDGO_02519 | None | CE15(77-445) | None | None | Y(1-21) | 1 |
| OJEIIDGO_02581 | None | GT51(70-247) | None | None | N | 1 |
| OJEIIDGO_02606 | None | GT119(20-203)+GT119(266-469) | None | None | N | 1 |
| OJEIIDGO_02704 | None | GH109(7-164) | None | None | N | 1 |
| OJEIIDGO_02725 | None | GH103(41-335) | None | None | Y(1-31) | 1 |
| OJEIIDGO_02748 | None | GT4(158-302) | None | None | N | 1 |
| OJEIIDGO_02758 | None | GT28(192-351) | None | None | N | 1 |
| OJEIIDGO_02759 | None | GT119(32-381) | None | None | N | 1 |
| OJEIIDGO_02858 | None | CBM91(2-167) | None | None | N | 1 |
| OJEIIDGO_02880 | None | GT2(4-149) | None | None | N | 1 |
| OJEIIDGO_02920 | None | GH13_19(77-423) | None | None | Y(1-25) | 1 |
| OJEIIDGO_02940 | None | GT1(203-432) | None | None | N | 1 |
| OJEIIDGO_02953 | None | GH73(150-279) | None | None | N | 1 |
| OJEIIDGO_02972 | None | CE7(131-405) | None | None | N | 1 |
| OJEIIDGO_02985 | None | CBM48(23-120)+GH13_11(187-536) | None | None | N | 1 |
| OJEIIDGO_03073 | None | GT2(3-150) | None | None | N | 1 |
| OJEIIDGO_03111 | None | CBM98(232-329)+GH13_47(635-956) | None | None | Y(1-29) | 1 |
| OJEIIDGO_03180 | None | GT51(70-223) | None | None | N | 1 |
| OJEIIDGO_03210 | None | GT2(4-166) | None | None | N | 1 |
| OJEIIDGO_03211 | None | GT4(200-347) | None | None | N | 1 |
| OJEIIDGO_03246 | None | GT32(20-96) | None | None | N | 1 |
| OJEIIDGO_03249 | None | GH73(135-265) | None | None | N | 1 |
| OJEIIDGO_03255 | None | CE20(136-372) | None | None | Y(1-28) | 1 |
| OJEIIDGO_03260 | None | GT4(209-361) | None | None | N | 1 |
| OJEIIDGO_03273 | None | CE4(22-133) | None | None | N | 1 |
| OJEIIDGO_03279 | None | GH109(37-445) | None | None | N | 1 |
| OJEIIDGO_03291 | None | GT2(39-200) | None | None | N | 1 |
| OJEIIDGO_03348 | None | CE11(3-223) | None | None | N | 1 |
| OJEIIDGO_03363 | None | CBM9(433-589) | None | None | Y(1-22) | 1 |
| OJEIIDGO_03408 | None | CBM102(1-76)+GH16_3(262-495)+CBM102(613-749) | None | None | N | 1 |
| OJEIIDGO_03415 | None | GT2(8-181) | None | None | N | 1 |
| OJEIIDGO_03454 | None | GH188(4-151) | None | None | N | 1 |
| OJEIIDGO_03463 | None | GT2(102-226) | None | None | N | 1 |
| OJEIIDGO_03520 | None | GH109(42-190) | None | None | Y(1-23) | 1 |
| OJEIIDGO_03550 | None | GH13_48(58-342) | None | None | Y(1-27) | 1 |
| OJEIIDGO_03577 | None | CE13(57-237) | None | None | N | 1 |
| OJEIIDGO_03642 | None | GH19_1(110-196) | None | None | Y(1-23) | 1 |
| OJEIIDGO_03696 | None | GH73(118-246) | None | None | N | 1 |
| OJEIIDGO_03780 | None | GH177(38-379) | None | None | N | 1 |
| OJEIIDGO_03785 | None | GH179(92-259) | None | None | Y(1-30) | 1 |
| OJEIIDGO_03790 | None | GH43_28(19-289)+CBM32(332-445) | None | None | N | 1 |
| OJEIIDGO_03830 | None | GH102(195-303) | None | None | N | 1 |
| OJEIIDGO_03853 | None | GT2(14-137) | None | None | N | 1 |
| OJEIIDGO_03854 | None | GT2(6-124) | None | None | N | 1 |
| OJEIIDGO_03855 | None | GT4(215-363) | None | None | N | 1 |
| OJEIIDGO_03889 | None | GH13_3(219-437) | None | None | N | 1 |
| OJEIIDGO_03930 | None | GT2(4-189) | None | None | N | 1 |
| OJEIIDGO_03951 | None | GT2(9-168) | None | None | N | 1 |
| OJEIIDGO_03962 | None | CE1(34-237) | None | None | Y(1-19) | 1 |
Visit the CAZy website if you want to know more about the enzyme families and classes reported by dbCAN3. As with COGclassifier, we can run the analysis for all the MAGs and integrate the data afterwards to achieve an overall comparison.
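A long results table is easier to interpret once it is collapsed by CAZyme class. The sketch below counts annotated domains per class prefix (GH, GT, CE, PL, AA, CBM). It assumes the results were saved as a tab-separated file with one header line and the HMMER annotation in the third column, as in the table above; the function name `count_cazy_classes` is hypothetical, and the file layout you download from the server may differ.

```shell
# Sketch: count CAZyme domains per class in a dbCAN results table.
# Assumes a tab-separated file with one header line and the HMMER annotation
# in column 3, e.g. "CBM48(27-107)+GH13_9(166-463)".
count_cazy_classes() {
    awk -F'\t' 'NR > 1 && $3 != "None" && $3 != "-" {
        n = split($3, domains, "[+]")      # several domains are joined by "+"
        for (i = 1; i <= n; i++) {
            family = domains[i]
            sub(/\(.*/, "", family)        # drop the "(start-end)" coordinates
            sub(/[0-9_]+$/, "", family)    # GH13_9 -> GH, CBM48 -> CBM
            counts[family]++
        }
    }
    END { for (c in counts) print c "\t" counts[c] }' "$1" | sort
}

# Example (hypothetical file name): count_cazy_classes dbcan_overview.tsv
```

Note that a protein with several domains contributes one count per domain, so the totals can exceed the number of genes.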
antiSMASH (antibiotics & Secondary Metabolite Analysis SHell) is a tool that detects and analyzes biosynthetic gene clusters (BGCs) in microbial genomes. These clusters are groups of co-located genes that together encode the machinery to produce secondary metabolites—specialized compounds that are not essential for basic cellular functions.
As with dbCAN3, we will use the web server to annotate one of the MAGs recovered by the pipeline. Download and extract the folder intermediate.tar.gz from the Moodle page, then upload the file intermediate/gbks/MEGAHIT-MetaBAT2-ERR2143759.7.gbk to antiSMASH and wait for the results.
If the server is slow, you can use the results we have pre-computed for you, stored within the same intermediate.tar.gz file at intermediate/antiSMASH/index.html
Proksee is an interactive web-based tool for visualizing, annotating, and analyzing prokaryotic genomes. Using this tool, we can visualize genomes as circular or linear maps, annotate them, customize the visualization, and export both high-level and detail-oriented figures.
As with the previous applications, you just need to upload the annotated genome in FASTA or GenBank format. Here, we are going to leverage the files produced by Prokka: download and extract the folder intermediate.tar.gz from the Moodle page (if you have not done so already), then upload the file intermediate/gbks/MEGAHIT-MetaBAT2-ERR2143759.7.gbk to Proksee.
You can customize the display, add features, and re-annotate the genome, among many other functionalities. Unfortunately, it processes only one genome at a time. From here, it is up to your creativity to take advantage of this tool.
You should be seeing this example:
For the previous tools we did not analyze just a random MAG: we selected this one because its taxonomic annotation is shared across different samples (see the BIgMAG exercise), which makes it interesting to establish the similarities and differences among these MAGs through a pangenome analysis. In such an analysis, we study the entire gene repertoire of related MAGs (same species or genus); for our analysis we have selected some MAGs based on their GTDB classification.
Next, we run Roary, a tool that determines the core, accessory and unique genomes. We have pre-computed the results for you following the tutorial presented by the developers of the tool, and now we are going to visualize the results using Phandango.
Download and extract the folder intermediate.tar.gz from the Moodle page, then drag and drop the files
intermediate/pangenome/JAAUTG01/workshop.newick and
intermediate/pangenome/JAAUTG01/gene_presence_absence.csv
onto the Phandango
web server and visualize the results.
It should look like this:
Usually, this analysis is carried out using a reference genome; however,
given that these MAGs are not annotated at the species level, and the genus
annotation is not informative, we do not have a reference genome
to explore the pangenome of these MAGs. You can visualize an example
that includes a reference genome with the files found at
intermediate/pangenome/example_with_reference.
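Besides the visual inspection in Phandango, the Roary table can be summarized on the terminal. The snippet below is a rough sketch under several assumptions: every field in gene_presence_absence.csv is double-quoted, the fourth column is "No. isolates" (the layout Roary normally writes), and you supply the number of genomes yourself. The function name `summarize_pangenome` is hypothetical, and "core" here means present in every genome, which is stricter than Roary's default threshold.

```shell
# Sketch: count core vs. accessory gene clusters in Roary's
# gene_presence_absence.csv.
# Assumptions: all fields are double-quoted, column 4 is "No. isolates",
# and the caller passes the total number of genomes in the analysis.
summarize_pangenome() {
    local csv="$1" n_genomes="$2"
    awk -F'","' -v total="$n_genomes" 'NR > 1 {
        if ($4 + 0 == total + 0) core++; else accessory++
    }
    END { printf "core\t%d\naccessory\t%d\n", core, accessory }' "$csv"
}

# Example, replacing N with the number of genomes in the analysis:
# summarize_pangenome intermediate/pangenome/JAAUTG01/gene_presence_absence.csv N
```

Splitting on the `","` sequence rather than on a bare comma keeps annotations that contain commas in a single field.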